
[model] use self attn in megatron for gated attn#624

Merged
zhuzilin merged 2 commits into main from feature/qwen3next
Oct 29, 2025

Conversation

@zhuzilin
Contributor

@zhuzilin zhuzilin commented Oct 29, 2025

This enables context parallelism (CP) and tensor parallelism (TP) for the gated attention part of Qwen3Next models, and is largely inspired by https://github.com/alibaba/Pai-Megatron-Patch

After this PR, tensor parallelism can be used for Qwen3Next.
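To illustrate why routing the gated attention through Megatron's self-attention unlocks tensor parallelism, here is a minimal NumPy sketch (not slime's or Megatron's actual code) of how a standard attention layer shards across TP ranks: each rank holds a column shard of the QKV projection (a subset of heads) and the matching row shard of the output projection, and summing the per-rank partial outputs (the all-reduce step) reproduces the unsharded result. All names and shapes here are illustrative assumptions.

```python
# Illustrative sketch of tensor-parallel self-attention, simulated on one
# process. Assumption: heads divide evenly across TP ranks.
import numpy as np

rng = np.random.default_rng(0)
seq, d_model, n_heads, tp = 4, 8, 2, 2
d_head = d_model // n_heads

x = rng.standard_normal((seq, d_model))
w_qkv = rng.standard_normal((d_model, 3 * d_model))  # fused QKV projection
w_out = rng.standard_normal((d_model, d_model))      # output projection

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    return probs @ v

def full_attention(x):
    # Reference: unsharded multi-head attention.
    qkv = x @ w_qkv
    q, k, v = np.split(qkv, 3, axis=-1)
    heads = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        heads.append(attention(q[:, sl], k[:, sl], v[:, sl]))
    return np.concatenate(heads, axis=-1) @ w_out

def tp_attention(x):
    # Each rank owns n_heads // tp heads: column shards of w_qkv and the
    # matching row shard of w_out. Partial outputs are summed, which is
    # what the all-reduce does across real ranks.
    out = np.zeros((seq, d_model))
    heads_per_rank = n_heads // tp
    for rank in range(tp):
        hs = rank * heads_per_rank * d_head
        he = hs + heads_per_rank * d_head
        wq = w_qkv[:, hs:he]                            # Q columns for rank
        wk = w_qkv[:, d_model + hs:d_model + he]        # K columns for rank
        wv = w_qkv[:, 2 * d_model + hs:2 * d_model + he]  # V columns for rank
        q, k, v = x @ wq, x @ wk, x @ wv
        local = []
        for h in range(heads_per_rank):
            sl = slice(h * d_head, (h + 1) * d_head)
            local.append(attention(q[:, sl], k[:, sl], v[:, sl]))
        local = np.concatenate(local, axis=-1)
        out += local @ w_out[hs:he, :]  # row shard + simulated all-reduce
    return out

assert np.allclose(full_attention(x), tp_attention(x))
```

Because the linear-attention (gated DeltaNet) part of Qwen3Next mixes state across the full hidden dimension, it does not shard this way; reusing Megatron's self-attention for the gated attention blocks lets at least those layers pick up TP and CP for free.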

@zhuzilin zhuzilin marked this pull request as ready for review October 29, 2025 13:43
@zhuzilin zhuzilin merged commit 08118ec into main Oct 29, 2025
2 of 4 checks passed
llltttwww pushed a commit to llltttwww/slime that referenced this pull request Nov 30, 2025
Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026
